39 research outputs found

    The development and interlinkage of a drought vocabulary in the EuroGEOSS interoperable catalogue infrastructure

    Metadata catalogues are used for facilitating the discovery of data and web services in, e.g., growing collections of Earth observation resources. Two conditions need to be met in order to successfully retrieve resources in catalogues: the metadata describing resources have to be complete and accurate, and the keywords used in searches have to be semantically related to the keywords contained in the metadata descriptions. One method to increase the rate of successfully retrieved metadata in catalogues is the use of controlled vocabularies. Such vocabularies can be used for annotating metadata with appropriate keywords and can then also be presented to users of the catalogue for specifying search terms. In the process of preparing metadata for drought-related data and services within the EuroGEOSS project, the need for a drought-specific vocabulary arose. This paper presents this drought vocabulary, the methodology followed for its development, and its integration in the EuroGEOSS drought infrastructure, and discusses its usefulness for the drought thematic area. The usefulness of the vocabulary is measured by an increased use of search terms coming from an appropriate vocabulary and by an increase in the successful retrieval of resources. In particular, metadata must be annotated with appropriate keywords from a controlled vocabulary, thesaurus or ontology suitable for that particular field.

    A software processing chain for evaluating thesaurus quality

    Thesauri are knowledge models commonly used for information classification and retrieval, whose structure is defined by standards that describe the main features the concepts and relations must have. However, following these standards requires deep knowledge of the field the thesaurus is going to cover and experience in thesaurus creation. To help in this task, this paper describes a software processing chain that provides different validation components that evaluate the quality of the main thesaurus features.

    A method for checking the quality of geographic metadata based on ISO 19157

    With recent advances in remote sensing, location-based services and other related technologies, the production of geospatial information has exponentially increased in the last decades. Furthermore, to facilitate discovery and efficient access to such information, spatial data infrastructures were promoted and standardized, with a consideration that metadata are essential to describing data and services. Standardization bodies such as the International Organization for Standardization have defined well-known metadata models such as ISO 19115. However, current metadata assets exhibit heterogeneous quality levels because they are created by different producers with different perspectives. To address quality-related concerns, several initiatives attempted to define a common framework and test the suitability of metadata through automatic controls. Nevertheless, these controls are focused on interoperability by testing the format of metadata and a set of controlled elements. In this paper, we propose a methodology of testing the quality of metadata by considering aspects other than interoperability. The proposal adapts ISO 19157 to the metadata case and has been applied to a corpus of the Spanish Spatial Data Infrastructure. The results demonstrate that our quality check helps determine different types of errors for all metadata elements and can be almost completely automated to enhance the significance of metadata

    Blur2Sharp: A GAN-based model for document image deblurring

    Advances in mobile technology and portable cameras have greatly facilitated the acquisition of text images. However, the blur caused by camera shake or out-of-focus problems may affect the quality of acquired images and their use as input for optical character recognition (OCR) or other types of document processing. This work proposes an end-to-end model for document deblurring using cycle-consistent adversarial networks. The main novelty of this work is to achieve blind document deblurring, i.e., deblurring without knowledge of the blur kernel. Our method, named “Blur2Sharp CycleGAN,” generates a sharp image from a blurry one and shows how cycle-consistent generative adversarial networks (CycleGAN) can be used in document deblurring. Only the blurred image is used as input, so no information about the blur kernel is required. In the evaluation part, we use peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) to compare the deblurred images. The experiments demonstrate a clear improvement in visual quality with respect to the state of the art using a dataset of text images.
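
    The PSNR metric named in this abstract can be illustrated with a minimal sketch. It uses the standard definition, PSNR = 10 log10(MAX^2 / MSE), and is not the authors' evaluation code: the function name `psnr`, the list-of-rows image representation, and the 8-bit default `max_value=255.0` are assumptions made for illustration.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio (dB) between two equally sized grayscale images.

    Images are given as lists of rows of pixel intensities (an assumed,
    illustrative representation). max_value is the largest possible pixel
    value, assumed here to be 255 for 8-bit images.
    """
    if len(reference) != len(restored) or any(
        len(ref_row) != len(res_row)
        for ref_row, res_row in zip(reference, restored)
    ):
        raise ValueError("images must have the same dimensions")

    pixel_count = sum(len(row) for row in reference)
    if pixel_count == 0:
        raise ValueError("images must not be empty")

    # Mean squared error over all pixel pairs.
    mse = sum(
        (ref - res) ** 2
        for ref_row, res_row in zip(reference, restored)
        for ref, res in zip(ref_row, res_row)
    ) / pixel_count

    if mse == 0:
        return math.inf  # identical images: no noise at all
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    sharp = [[0, 255], [255, 0]]
    deblurred = [[10, 245], [245, 10]]
    print(f"PSNR: {psnr(sharp, deblurred):.2f} dB")
```

    A higher PSNR means the deblurred image is numerically closer to the sharp reference. SSIM, the second metric mentioned, compares local structure rather than raw pixel error and is not covered by this sketch.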

    Discrete Global Grid Systems with quadrangular cells as reference frameworks for the current generation of Earth observation data cubes

    Discrete Global Grid Systems are spatial reference frameworks that associate information to multi-resolution grids of uniquely identified cells; they are proposed as mechanisms to facilitate the efficient integration of heterogeneous spatial data. They could provide an excellent reference system for Earth observation data cubes, technological infrastructures that provide analysis-ready access to Earth observation big data, as long as they can be made compatible with them. In this paper, we demonstrate that this is currently feasible without requiring new technological developments. We show how a Discrete Global Grid System with quadrangular cells, rHEALPix, and an existing data cube platform, Open Data Cube, can be integrated without losing the advantages of having all the data in a Discrete Global Grid System, while keeping straightforward access to all of the analysis tools provided by an Earth observation data cube.

    Approaches for the clustering of geographic metadata and the automatic detection of quasi-spatial dataset series

    The discrete representation of resources in geospatial catalogues affects their information retrieval performance. The performance could be improved by using automatically generated clusters of related resources, which we name quasi-spatial dataset series. This work evaluates whether a clustering process can create quasi-spatial dataset series using only textual information from metadata elements. We assess the combination of different kinds of text cleaning approaches, word- and sentence-embedding representations (Word2Vec, GloVe, FastText, ELMo, Sentence BERT, and Universal Sentence Encoder), and clustering techniques (K-Means, DBSCAN, OPTICS, and agglomerative clustering) for the task. The results demonstrate that combining word-embedding representations with agglomerative clustering creates better quasi-spatial dataset series than the other approaches. In addition, we have found that the ELMo representation with agglomerative clustering produces good results without any preprocessing step for text cleaning.

    Review of the quality of open budget datasets (Revisión de la calidad de los conjuntos de datos abiertos sobre presupuestos)

    In this work, we present the results of a quality evaluation of open budget datasets in Spain. To carry out this evaluation, we applied the Metadata Quality Assurance (MQA) methodology proposed by the European Data Portal. We adapted an automatic methodology that applies the five MQA dimensions, grouped by the spatial property, and that generates graphs describing the metadata corpus as well as comparative graphs modelled on the existing ranking in the MQA portal. The results show that, despite the different entities producing the metadata, all of them reach a similar score, limited mainly by the norm that defines the design of the open data portal in Spain.

    Using a hybrid approach for the development of an ontology in the hydrographical domain

    This work presents a hybrid approach for domain ontology development, which merges top-down and bottom-up techniques. In the top-down approach, the concepts in the ontology are derived from an analysis and study of relevant information sources about the domain (e.g., hydrographic features). In the bottom-up approach, the concepts in the ontology are the result of applying formal methods to an analysis of the data instances in the repositories (e.g., repositories containing hydrographical features).

    A hierarchical one-to-one mapping solution for semantic interoperability

    The importance of interoperability among computer systems has been progressively increasing in recent years. The tendency of current cataloguing systems is to interchange metadata in XML according to the specific standard required by each user on demand. According to the research literature, there are two main approaches to tackling this problem: solutions based on the use of ontologies and solutions based on the creation of specific crosswalks for one-to-one mapping. This paper proposes a hierarchical one-to-one mapping solution for improving semantic interoperability.